Medical care in the United States cost Americans an average of approximately 8,500 per person per year, the equivalent of 18% of US GDP annually. This is by far the most of any country in the world, both as a dollar ammount and as a percentage of GDP. Most of this cost comes from emergency medical care and scheduled care at hospital facilities.
Although our health care purchases are arguably the most important ones we will make as consumers, there is remarkably little information on how much different medical procedures cost and what we are receiving in exchange for our money. Supposedly higher quality of care is more expensive, but is this actually true? We used R and Tableau to explore these disparities in the cost of care and their relationship to care quality.
source("RPlots/ConnectToOracle.R", echo = TRUE)
##
## > library(shiny)
##
## > library(shinyapps)
##
## Attaching package: 'shinyapps'
##
## The following object is masked from 'package:shiny':
##
## hr
##
## > library(rJava)
##
## > library(DBI)
##
## > library(RJDBC)
##
## > library(ggplot2)
##
## > options(java.parameters = "-Xmx2g")
##
## > jdbcDriver <- JDBC(driverClass = "oracle.jdbc.OracleDriver",
## + classPath = "~/ojdbc7.jar")
##
## > con <- dbConnect(jdbcDriver, "jdbc:oracle:thin:@128.83.138.158:1521/pdborcl",
## + "ds_medicare", "orcl")
We ran these querries to Import Data into R from the ds_medicare database.
source("RPlots/SQLQuerries.R", echo = TRUE)
##
## > HCAHPSMeasure = dbGetQuery(con, "Select * From Measures")
##
## > InpatientServices <- dbGetQuery(con, "Select * from InpatientServices")
##
## > Providers = dbGetQuery(con, "Select * from Providers")
##
## > OutpatientServices <- dbGetQuery(con, "Select * from OutpatientServices")
##
## > OutpatientVisits <- dbGetQuery(con, "select * from Outpatient WHERE ID BETWEEN 0 and 30000")
##
## > OutpatientVisits = rbind(OutpatientVisits, dbGetQuery(con,
## + "select * from Outpatient WHERE ID BETWEEN 30001 and 60000"))
##
## > OutpatientVisits = rbind(OutpatientVisits, dbGetQuery(con,
## + "select * from Outpatient WHERE ID BETWEEN 60001 and 90000"))
##
## > OutpatientVisits = rbind(OutpatientVisits, dbGetQuery(con,
## + "select * from Outpatient WHERE ID BETWEEN 90001 and 100000"))
##
## > InpatientVisits <- dbGetQuery(con, "select * from Inpatient WHERE ID BETWEEN 0 and 30000")
##
## > InpatientVisits = rbind(InpatientVisits, dbGetQuery(con,
## + "select * from Inpatient WHERE ID BETWEEN 30001 and 60000"))
##
## > InpatientVisits = rbind(InpatientVisits, dbGetQuery(con,
## + "select * from Inpatient WHERE ID BETWEEN 60001 and 90000"))
##
## > InpatientVisits = rbind(InpatientVisits, dbGetQuery(con,
## + "select * from Inpatient WHERE ID BETWEEN 90001 and 100000"))
##
## > InpatientVisits = rbind(InpatientVisits, dbGetQuery(con,
## + "select * from Inpatient WHERE ID BETWEEN 100001 and 130000"))
##
## > InpatientVisits = rbind(InpatientVisits, dbGetQuery(con,
## + "select * from Inpatient WHERE ID BETWEEN 130001 and 160000"))
##
## > outpatientCostByCity = dbGetQuery(con, "SELECT Providers.City as City, AVG(OutPatient.AverageSubmittedCharges) as AvgBilledCost \n ..." ... [TRUNCATED]
##
## > outpatientCostByState = dbGetQuery(con, "SELECT Providers.State as State, AVG(OutPatient.AverageSubmittedCharges) as AvgBilledCost \n ..." ... [TRUNCATED]
##
## > outpatientCostByHospital = dbGetQuery(con, "\n SELECT Providers.Name as Hospital, AVG(OutPatient.AverageSubmitt ..." ... [TRUNCATED]
##
## > outpatientCostByCity = dbGetQuery(con, "SELECT Providers.City as City, AVG(OutPatient.AverageSubmittedCharges) as AvgBilledCost \n ..." ... [TRUNCATED]
##
## > outpatientCostByState = dbGetQuery(con, "SELECT Providers.State as State, AVG(OutPatient.AverageSubmittedCharges) as AvgBilledCost \n ..." ... [TRUNCATED]
##
## > outpatientCostByHospital = dbGetQuery(con, "SELECT Providers.Name as Hospital, AVG(OutPatient.AverageSubmittedCharges) as AvgBilledCost \n ..." ... [TRUNCATED]
##
## > InpatientCostByCity = dbGetQuery(con, "SELECT Providers.City as City, AVG(InPatient.CoveredCharges) as AvgBilledCost \n ..." ... [TRUNCATED]
##
## > InpatientCostByState = dbGetQuery(con, "SELECT Providers.State as State, AVG(Inpatient.CoveredCharges) as AvgBilledCost \n ..." ... [TRUNCATED]
##
## > InpatientCostByHospital = dbGetQuery(con, "\n SELECT Providers.Name as Hospital, AVG(InPatient.CoveredCharges) a ..." ... [TRUNCATED]
##
## > PatientsRated9or10 = dbGetQuery(con, "\n Select Providers.Name, Reviews.AnswerPercent FROM Providers\n ..." ... [TRUNCATED]
##
## > PatientsRated9or10$ANSWERPERCENT <- as.numeric(PatientsRated9or10$ANSWERPERCENT)
##
## > CostVSRating = dbGetQuery(con, "\n Select Reviews.Answerpercent AS Rating, Reviews.SurveyID AS Question, \n ..." ... [TRUNCATED]
##
## > Rated9or10 = subset(CostVSRating, QUESTION == "H_HSP_RATING_9_10")
##
## > Rated7or8 = subset(CostVSRating, QUESTION == "H_HSP_RATING_7_8")
##
## > Rated0to6 = subset(CostVSRating, QUESTION == "H_HSP_RATING_0_6")
##
## > DefinitelyRecommend = subset(CostVSRating, QUESTION ==
## + "H_RECMND_DY")
##
## > ProbablyRecommend = subset(CostVSRating, QUESTION ==
## + "H_RECMND_PY")
##
## > NotRecommend = subset(CostVSRating, QUESTION == "H_RECMND_DN")
##
## > TexasQuery = subset(CostVSRating, STATE == "TX")
##
## > AustinQuery = subset(TexasQuery, REGION == "TX - Austin")
##
## > AverageCostBy910Rating <- aggregate(cbind(COST, INSUREDCOST) ~
## + PROCEDURE, Rated9or10, mean)
##
## > CheaperOutpatient = subset(AverageCostBy910Rating,
## + COST < 2000)
##
## > InpatientVisits$TOTALPAYMENTS <- as.numeric(InpatientVisits$TOTALPAYMENTS)
##
## > p <- subset(OutpatientVisits, APCID == 12)
##
## > p <- mean(p$AVERAGESUBMITTEDCHARGES)
##
## > TexasCostByProcedure <- aggregate(INSUREDCOST ~ PROCEDURE,
## + TexasQuery, mean)
source("RPlots/Plots2.R", echo = TRUE)
##
## > p3 <- hist(InpatientVisits$TOTALPAYMENTS, main = "Inpatient Procedure Cost",
## + xlab = "Average Ammount Billed Per Procedure", ylab = "# of Hosp ..." ... [TRUNCATED]
##
## > p4 <- hist(OutpatientVisits$AVERAGESUBMITTEDCHARGES,
## + main = "Outpatient Procedure Cost", xlab = "Average Amount Billed Per Procedure",
## + .... [TRUNCATED]
##
## > p5 <- hist(PatientsRated9or10$ANSWERPERCENT, main = "Patient Satisfaction \nRatings",
## + xlab = "Percent of Patients Who Rated \n Their Hospital ..." ... [TRUNCATED]
##
## > p1 <- ggplot(InpatientCostByState, aes(x = STATE,
## + y = AVGBILLEDCOST)) + geom_point() + coord_flip()
##
## > p1
##
## > p2 <- ggplot(outpatientCostByState, aes(x = STATE,
## + y = AVGBILLEDCOST)) + geom_point() + coord_flip()
##
## > p2
##
## > p7 <- ggplot(Rated9or10, aes(x = RATING, y = COST)) +
## + geom_point() + facet_wrap(~PROCEDURE)
##
## > p7
## Warning: Removed 10 rows containing missing values (geom_point).
## Warning: Removed 14 rows containing missing values (geom_point).
## Warning: Removed 11 rows containing missing values (geom_point).
## Warning: Removed 10 rows containing missing values (geom_point).
## Warning: Removed 4 rows containing missing values (geom_point).
## Warning: Removed 6 rows containing missing values (geom_point).
## Warning: Removed 2 rows containing missing values (geom_point).
## Warning: Removed 3 rows containing missing values (geom_point).
## Warning: Removed 7 rows containing missing values (geom_point).
## Warning: Removed 7 rows containing missing values (geom_point).
## Warning: Removed 10 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 3 rows containing missing values (geom_point).
## Warning: Removed 2 rows containing missing values (geom_point).
## Warning: Removed 2 rows containing missing values (geom_point).
## Warning: Removed 3 rows containing missing values (geom_point).
## Warning: Removed 8 rows containing missing values (geom_point).
## Warning: Removed 14 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 4 rows containing missing values (geom_point).
## Warning: Removed 2 rows containing missing values (geom_point).
## Warning: Removed 13 rows containing missing values (geom_point).
##
## > p8 <- ggplot(Rated9or10, aes(x = RATING, y = COST)) +
## + geom_point() + facet_wrap(~STATE)
##
## > p8
## Warning: Removed 10 rows containing missing values (geom_point).
## Warning: Removed 21 rows containing missing values (geom_point).
## Warning: Removed 16 rows containing missing values (geom_point).
## Warning: Removed 7 rows containing missing values (geom_point).
## Warning: Removed 2 rows containing missing values (geom_point).
## Warning: Removed 6 rows containing missing values (geom_point).
## Warning: Removed 36 rows containing missing values (geom_point).
## Warning: Removed 28 rows containing missing values (geom_point).
## Warning: Removed 11 rows containing missing values (geom_point).
##
## > p10 <- ggplot(TexasCostByProcedure, aes(x = PROCEDURE,
## + y = INSUREDCOST)) + geom_point() + coord_flip()
##
## > p10
##
## > p11 <- ggplot(TexasQuery, aes(x = RATING, y = COST)) +
## + geom_point() + facet_wrap(~PROCEDURE)
##
## > p11
## Warning: Removed 58 rows containing missing values (geom_point).
## Warning: Removed 58 rows containing missing values (geom_point).
## Warning: Removed 58 rows containing missing values (geom_point).
## Warning: Removed 87 rows containing missing values (geom_point).
## Warning: Removed 58 rows containing missing values (geom_point).
##
## > p12 <- ggplot(AustinQuery, aes(x = RATING, y = COST)) +
## + geom_point() + facet_wrap(~PROCEDURE)
##
## > p12
Even within Austin, there is a wide disparity in the cost of care, shown here for both insured and uninsured patients and by procedure:
We did find that higher cost was somewhat positively correlated with hospital volume (the number of patients seen):
And slightly more so nationally:
And although we found that higher cost also had a slight correlation with hospital quality locally:
…that correlation dropped off when the view was expanded to the national level:
And we can see by breaking the data down by procedure that that different procedure specializations between hospitals also aren’t the driving factor in cost disparity. In fact, the only thing positively cooralated here is the cost for insured patients to the cost for uninsured patients.
To find other possible correlations with high cost, we wanted to see if there were any states that stood out with higher costs in both Medicare reinbursements and patient copay costs and see if we could find any common factors among these high-cost states.
WY, SD, OR, NY, NJ, MA, ID, HI, DE, DC, CT, CA, AZ, AK
WY, UT, SD, NV, ID, HI, AK
WY, UT, SD, NV, ID, HI, AK
Intersection of the highest Medicare and Copay clusters yielded 4 states: Wyoming, Alaska, South Dakota, and Idaho. These states with high costs for insured patients tended towards voting Democratic and have low populations.
We also found that, possibly because of the high cost of medical care, there were a large number of people forgoing medical care because of cost.
And that the people who needed health care the most were the least likely to receive it.
They also tend to be poor, uninsured, and have poor mental health.
This was less true for those 65 and older, however, who were the least likely age group to forgo medical care because of cost.
Being over 62 was also a factor for falling into the group we found was most likely to receive care at 98.81%
However, those uninsured and 65 and over fell into the group of people least likely to receive care. Node 85 was the least likely group to receive care, at 82.65%.
Purely in terms of access and total cost, then, we can say that the Medicare program has been extremely successful in ensuring access to health care for the vulnerable population of seniors.
In our research, we also wanted to identify factors with high correlation to the existence of depression or other mental health disorders.
The dataset which was used included answers from a phone survey that took place in 2011-2012. Most questions related to behaviors resulting in leading causes of premature mortality and morbidity among adults. Over 139,000 responses were collected to 177 such questions.
18 attributes were chosen and extracted from the dataset as the basis of our examination in order to determine which factors have the highest correlation to mental disorders or depression. After all null values were eliminated from our selection, we had a remaining 108.544 rows.
| Attribute | Meaning |
|---|---|
| ADDEPEV2 | Ever told you had a depressive disorder? |
| AGE | Age |
| EDUCAG | Level of education (1-4) |
| EMPLOY | Employment Status (1-5) |
| EXERANY1 | Exercise in past 30 days |
| GENHLTH | Would you say that in general your health is? (1-5; Excellent-Poor) |
| HLTHPLN1 | Do you have any kind of health coverage? |
| INCOME2 | Income Level (1-8) |
| INCOMG | Computed Income Categories (1-5) |
| MEDCOST | Was there a time in the past 12 months, when you could not see a doctor because of the cost? |
| RFBING5 | Binge Drinking Calculated Variable |
| RFBMI5 | Overweight or obese calculated variable (BMI > 25.0) |
| RFDRHV4 | Heavy Alcohol Consumption Calculated Variable |
| RFHLTH | Adults with good or better health |
| RFSMOK3 | Adults who are currently smokers |
| SEX | Sex |
| VETERAN3 | Are you a veteran? |
Each feature created by NMF is a linear combination of the original attribute set. Each feature has a set of coefficients, which are a measure of the weight of each attribute on the feature. There is a separate coefficient for each numerical attribute and for each distinct value of each categorical attribute. The coefficients are all non-negative.